
Avoid deep copy on lz4 decompression #7437

Merged 4 commits on Dec 29, 2022
Conversation

@crusaderky (Collaborator) commented Dec 28, 2022

Speed up deserialization when
a. lz4 is installed, and
b. the buffer is compressible, and
c. the buffer is smaller than 64 MiB (distributed.comm.shard)
Note that the default chunk size in dask.array is 128 MiB.

Note that this does not prevent a memory flare, as there's an unnecessary deep copy upstream as well:
https://github.com/python-lz4/python-lz4/blob/79370987909663d4e6ef743762768ebf970a2383/lz4/block/_block.c#L256
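The actual diff isn't reproduced in this conversation view. As a rough illustration of the idea only (hypothetical helper name, not the code in this PR), python-lz4 can be asked to decompress into a mutable bytearray via return_bytearray=True, so that downstream deserialization can build a writeable array without another copy:

import lz4.block
import numpy as np

def lz4_decompress_writeable(payload):
    # Hypothetical helper, not the actual distributed.protocol change:
    # return_bytearray=True makes python-lz4 return a mutable bytearray,
    # which can back a writeable numpy array without a further copy.
    return lz4.block.decompress(payload, return_bytearray=True)

x = np.arange(1_000_000, dtype="int64")
compressed = lz4.block.compress(x.data)
buf = lz4_decompress_writeable(compressed)
y = np.frombuffer(buf, dtype="int64")  # writeable view over the bytearray
assert (x == y).all() and y.flags.writeable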

@crusaderky self-assigned this Dec 28, 2022
github-actions bot (Contributor) commented Dec 28, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     22 files  ±  0       22 suites  ±  0   9h 57m 15s ⏱️ −23m 5s
  3 283 tests  +  5    3 196 ✔️ +18      85 💤 ±  0    2 ❌ −13
36 044 runs   +55   34 501 ✔️ +93   1 541 💤 +24     2 ❌ −62

For more details on these failures, see this check.

Results for commit 9b14c35. ± Comparison against base commit f3995b5.

♻️ This comment has been updated with latest results.

@mrocklin (Member) left a comment

In principle this seems fine. I did have a couple of small questions though.

  x = np.arange(1000000, dtype="int64")
  compression, payload = maybe_compress(x.data)
- assert compression == "lz4"
+ assert compression in {"lz4", "snappy", "zstd", "zlib"}
@mrocklin (Member)
I think I would be sad if we used zlib by default in any configuration. I'll bet that it's faster to just send data uncompressed over the network.

@crusaderky (Collaborator, Author)

Ah, you're right. I misread the code; default compression is lz4 -> snappy -> None.
I've amended the tests and added a specific test for the priority order.
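The amended test itself isn't shown here; a minimal sketch of a priority-order check (hypothetical test name, assuming maybe_compress is importable from distributed.protocol.compression) could look like:

import numpy as np

def test_compression_priority_order():
    # Hypothetical sketch, not the test added in this PR: whichever of
    # lz4/snappy is installed should win, in that order; with neither
    # installed the data stays uncompressed and compression is None.
    from distributed.protocol.compression import maybe_compress

    x = np.arange(1_000_000, dtype="int64")
    compression, payload = maybe_compress(x.data)
    try:
        import lz4  # noqa: F401
        assert compression == "lz4"
    except ImportError:
        try:
            import snappy  # noqa: F401
            assert compression == "snappy"
        except ImportError:
            assert compression is None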

@@ -217,7 +217,6 @@ def test_itemsize(dt, size):


 def test_compress_numpy():
-    pytest.importorskip("lz4")
@mrocklin (Member)

I'm curious, why this change? If we didn't have lz4, snappy, or zstandard installed (all of which are optional I think) then I'd expect this to fail.

The only compressor we have by default, I think, is zlib, and we don't compress with that by default.

@crusaderky (Collaborator, Author)

Actually, if you have snappy but not lz4 it will succeed.
zstandard does not install itself as a default compressor.
Amended the tests to reflect this.
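A short sketch of what that skip condition might look like (hypothetical helper, not the exact amendment in this PR):

import pytest

def _require_default_compressor():
    # Hypothetical helper: skip unless lz4 or snappy is importable.
    # zstandard alone is not enough, because it does not install itself
    # as a default compressor.
    try:
        import lz4  # noqa: F401
    except ImportError:
        pytest.importorskip("snappy")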

Successfully merging this pull request may close these issues.

Deserialization of compressed data is sluggish and causes memory flares